An Efficient Storm Identification from Big Rainfall Data Using MapReduce
Abstract
This paper is part of my doctoral dissertation, “Analysis and Modeling Techniques for Geo-Spatial Datasets,” which focuses on how to summarize, model, and format spatiotemporal data for analysis and mining. The dissertation consists of four main components: (1) spatiotemporal knowledge representation, (2) identifying meaningful concepts from raw data, (3) converting raw data to conceptual data, and (4) analysis and mining of conceptual data. This paper, part of the third component, describes an efficient MapReduce algorithm for converting raw rainfall data into meaningful storm information, which can then be easily analyzed and mined. Our previous work proposed a method to identify relevant storm characteristics from raw rainfall data. That original storm identification system takes too long to produce the summarized storm characteristics, for two reasons: (1) the raw rainfall data, which qualifies as big data, is stored in a traditional relational database based on the CUAHSI (Consortium of Universities for the Advancement of Hydrologic Science, Inc.) ODM (Observations Data Model), which leads to substantial disk I/O; and (2) the storm identification algorithm is based on recursion and regular depth-first search (DFS), which retrieves parts of the data multiple times. In this paper, we obtain a substantial improvement in performance by using MapReduce and by reading the original raw rainfall text files directly instead of the data in the relational database. In our experiments, the new storm identification system performs significantly better than the previous one. The new system should benefit hydrologists by helping them perform rainfall-related analyses (both location-specific and storm-specific), such as flood prediction, using our identified storms.
Keywords: storm analysis; rainfall; big data; MapReduce; distributed computing; CUAHSI
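To make the map/reduce idea in the abstract concrete, here is a minimal single-machine sketch in Python. It is not the paper's algorithm: the record layout (`site_id,hour,rain_mm`), the `MAX_DRY_GAP_HOURS` threshold, the per-site definition of a storm, and all function names are assumptions chosen for illustration. It only shows how reading raw text lines once and grouping them by site can avoid the repeated retrievals attributed to the recursive DFS approach.

```python
from collections import defaultdict

# Hypothetical threshold: wet hours at most this far apart are treated as one storm.
MAX_DRY_GAP_HOURS = 6


def map_record(line):
    """Map step: parse an assumed 'site_id,hour,rain_mm' line.

    Emits (site_id, (hour, rain_mm)) only for hours with rainfall.
    """
    site_id, hour, rain_mm = line.strip().split(",")
    rain = float(rain_mm)
    if rain > 0.0:
        yield site_id, (int(hour), rain)


def shuffle(pairs):
    """Shuffle step: group mapped values by key, as a MapReduce framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_site(site_id, observations):
    """Reduce step: one sorted pass over a site's wet hours, split into storms."""
    storms = []
    current = None
    for hour, rain in sorted(observations):
        if current is not None and hour - current["end"] <= MAX_DRY_GAP_HOURS:
            # Close enough to the previous wet hour: extend the current storm.
            current["end"] = hour
            current["total_mm"] += rain
        else:
            # Gap too long (or first observation): start a new storm.
            if current is not None:
                storms.append(current)
            current = {"site": site_id, "start": hour, "end": hour, "total_mm": rain}
    if current is not None:
        storms.append(current)
    return storms


def identify_storms(lines):
    """Run map, shuffle, and reduce in sequence over raw text lines."""
    pairs = (pair for line in lines for pair in map_record(line))
    grouped = shuffle(pairs)
    result = []
    for site_id in sorted(grouped):
        result.extend(reduce_site(site_id, grouped[site_id]))
    return result


sample_lines = ["A,0,1.0", "A,1,2.5", "A,2,0.0", "A,10,4.0", "B,5,3.0"]
storms = identify_storms(sample_lines)
```

In this sketch each input line is parsed exactly once and every site is reduced independently, which is the property that lets a real MapReduce framework distribute the work across machines.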
Similar resources
Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming
The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...
An Improved Performance Evaluation on Large-Scale Data using MapReduce Technique
Abstract: In day-to-day life, the volume of data has increased enormously over time. The growth of data on social networking sites such as Facebook and Twitter is becoming unmanageable. In the past two years, data flows have grown into the zettabyte range. To handle big data, a number of applications have been developed. However, analyzing big data is a very challenging task today. Big Data refers to...
A Novel Approach for Identification of Hadoop Cloud Temporal Patterns Using Map Reduce
The latest developments in science and technology have resulted in efficient data transfer, the capability to handle huge volumes of data, and efficient data retrieval. Since the volume of stored data keeps growing, methods for retrieving relevant information and security-related concerns must be addressed efficiently to secure this bulk data. Also with ...
Collaborative Filtering Recommendation using Matrix Factorization: A MapReduce Implementation
Matrix Factorization based Collaborative Filtering (MFCF) has been an efficient method for recommendation. However, recent years have witnessed the explosive growth of big data, which contributes to the huge number of users and items in recommender systems. To address the efficiency of MFCF recommendation in the context of the big data challenge, we propose to leverage the MapReduce programming model...
Efficient Entity Matching over Multiple Data Sources with MapReduce
The execution of data-intensive tasks such as entity matching on large data sources has become a common demand in the era of Big Data. To face this challenge, cloud computing has proven to be a powerful ally for efficiently parallelizing the execution of such tasks. In this work we investigate how to efficiently perform entity matching over multiple large data sources using the MapReduce programming mo...
Publication date: 2013